Smart Paper Notes

Dynamic Adaptive Threshold based Learning for Noisy Annotations Robust Facial Expression Recognition

Darshan Gera , Naveen Siva Kumar Badveeti , Bobbili Veerendra Raj Kumar , S Balasubramanian

Categories: Computer Vision | Artificial Intelligence

2022-08-22

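Real-world facial expression recognition (FER) datasets suffer from noisy annotations due to crowd-sourcing, ambiguity in expressions, the subjectivity of annotators, and inter-class similarity. Recent deep networks, however, have a strong capacity to memorize noisy annotations, which leads to corrupted feature embeddings and poor generalization. To handle noisy annotations, we propose a dynamic FER learning framework (DNFER) in which clean samples are selected during training based on a dynamic class-specific threshold. Specifically, DNFER combines supervised training on the selected clean samples with unsupervised consistency training on all samples. During training, the mean posterior class probabilities of each mini-batch are used as the dynamic class-specific thresholds for selecting clean samples for supervised training. This threshold is independent of the noise rate and, unlike other methods, does not require any clean data. In addition, to learn from all samples, the posterior distributions of the weakly augmented and strongly augmented images are aligned using an unsupervised consistency loss. We demonstrate the robustness of DNFER on FER datasets with both synthetic and real noisy annotations, such as RAFDB, FERPlus, SFEW, and AffectNet.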
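
The selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract states only that the per-class mean posterior over a mini-batch serves as the threshold; the rule used here (keep a sample when the probability of its annotated label reaches that class's threshold) and the use of KL divergence as the consistency loss are assumptions, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def class_thresholds(probs: Sequence[Sequence[float]]) -> List[float]:
    """Mean posterior probability of each class over one mini-batch."""
    batch_size = len(probs)
    num_classes = len(probs[0])
    return [sum(row[c] for row in probs) / batch_size for c in range(num_classes)]


def select_clean(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> List[bool]:
    """Assumed rule: a sample counts as clean when the probability of its
    annotated label is at least the threshold of that class."""
    thresholds = class_thresholds(probs)
    return [row[y] >= thresholds[y] for row, y in zip(probs, labels)]


def consistency_loss(p_weak: Sequence[Sequence[float]],
                     p_strong: Sequence[Sequence[float]],
                     eps: float = 1e-12) -> float:
    """Assumed loss: mean KL(p_weak || p_strong) over the mini-batch."""
    total = 0.0
    for pw, ps in zip(p_weak, p_strong):
        total += sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(pw, ps))
    return total / len(p_weak)


# Toy mini-batch: four samples, two classes.
batch_probs = [[0.9, 0.1], [0.3, 0.7], [0.8, 0.2], [0.4, 0.6]]
batch_labels = [0, 1, 1, 0]
clean_mask = select_clean(batch_probs, batch_labels)
```

In the toy batch the thresholds are about 0.6 for class 0 and 0.4 for class 1, so the first two samples are kept and the last two are treated as noisy. Because the thresholds are recomputed from each mini-batch, no noise rate or clean subset has to be supplied, which matches the claim in the abstract.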

SS-MFAR : Semi-supervised Multi-task Facial Affect Recognition

Darshan Gera , Badveeti Naveen Siva Kumar , Bobbili Veerendra Raj Kumar , S Balasubramanian

Categories: Computer Vision

2022-07-19

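Automatic affect recognition has applications in many areas, such as education, gaming, software development, automotive, and medical care, but achieving appreciable performance on in-the-wild datasets is a non-trivial task. Although in-the-wild datasets represent real-world scenarios better than synthetic datasets do, they suffer from incomplete labels. Inspired by semi-supervised learning, this paper presents our submission to the Multi-Task Learning Challenge of the 4th Affective Behavior Analysis in-the-wild (ABAW) 2022 competition. The three tasks considered in this challenge are valence-arousal (VA) estimation; classification of expressions into the 6 basic categories (anger, disgust, fear, happiness, sadness, surprise), neutral, and an "other" category; and detection of 12 action units (AU), numbered AU-{1, 2, 4, 6, 7, 10, 12, 15, 23, 24, 25, 26}. Our method, Semi-supervised Multi-task Facial Affect Recognition (SS-MFAR), uses a deep residual network with a task-specific classifier for each task, together with adaptive thresholds for each expression class and semi-supervised learning for the incomplete labels. Source code is available at https://github.com/1980x/abaw20222dmacs.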

Design of an All-Purpose Terrace Farming Robot

Vibhakar Mohta , Adarsh Patnaik , Shivam Kumar Panda , Siva Vignesh Krishnan , Abhinav Gupta , Abhay Shukla , Gauri Wadhwa , Shrey Verma , Aditya Bandopadhyay

Categories: Robotics

2022-12-04

Automation in farming processes is a growing field of research in both academia and industries. A considerable amount of work has been put into this field to develop systems robust enough for farming. Terrace farming, in particular, provides a varying set of challenges, including robust stair climbing methods and stable navigation in unstructured terrains. We propose the design of a novel autonomous terrace farming robot, Aarohi, that can effectively climb steep terraces of considerable heights and execute several farming operations. The design optimisation strategy for the overall mechanical structure is elucidated. Further, the embedded and software architecture along with fail-safe strategies are presented for a working prototype. Algorithms for autonomous traversal over the terrace steps using the scissor lift mechanism and performing various farming operations have also been discussed. The adaptability of the design to specific operational requirements and modular farm tools allow Aarohi to be customised for a wide variety of use cases.

Wirelessly-Controlled Untethered Piezoelectric Planar Soft Robot Capable of Bidirectional Crawling and Rotation

Zhiwu Zheng , Hsin Cheng , Prakhar Kumar , Sigurd Wagner , Minjie Chen , Naveen Verma , James C. Sturm

Categories: Robotics

2022-07-01

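Electrostatic actuators offer a promising approach to creating soft robotic sheets because of their flexible form factor, modular integration, and fast response speed. However, controlling them requires kilovolt signals and an understanding of the complex dynamics that result from force interactions between on-board and environmental effects. In this work, we demonstrate an untethered planar five-actuator piezoelectric robot that is powered by batteries and on-board high-voltage circuitry and controlled through a wireless link. The scalable fabrication approach is based on bonding different functional layers on top of each other (steel foil substrate, actuators, flexible electronics). The robot exhibits a range of controllable motions, including bidirectional crawling (up to ~0.6 cm/s), turning, and in-place rotation (at ~1 degree/s). High-speed videos and control experiments show that the richness of the motion results from the interaction of an asymmetric mass distribution in the robot with the associated dependence of the dynamics on the driving frequency of the piezoelectrics.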

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava , Abhinav Rastogi , Abhishek Rao , Abu Awal Md Shoeb , Abubakar Abid , Adam Fisch , Adam R. Brown , Adam Santoro , Aditya Gupta , Adrià Garriga-Alonso

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2022-06-09

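Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. To inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.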

Biomarker Gene Identification for Breast Cancer Classification

Sheetal Rajpal , Ankit Rajpal , Manoj Agarwal , Naveen Kumar

Categories: Machine Learning

2021-11-10

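Background: Breast cancer has emerged as one of the most prevalent cancers among women and leads to a high mortality rate. Because of its heterogeneous nature, differentially expressed genes associated with breast cancer subtypes need to be identified for timely diagnosis and treatment. Objective: To identify, for each of the four breast cancer subtypes, a small gene set that can act as its signature, this paper proposes a novel algorithm for gene signature identification. Methods: The present work uses interpretable AI methods to investigate the predictions made by the deep neural network employed for subtype classification, in order to identify biomarkers using the TCGA breast cancer RNA sequence data. Results: The proposed algorithm led to the discovery of a set of 43 differentially expressed gene signatures. We achieved a competitive average 10-fold accuracy of 0.91 using a neural network classifier. Further, gene set analysis revealed several relevant pathways, such as GRB7 events in ERBB2 and the p53 signaling pathway. Using the Pearson correlation matrix, we noted that the subtype-specific genes are correlated within each subtype. Conclusions: The proposed technique enables us to find a concise and clinically relevant gene signature set.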

Deep Learning Based Model for Breast Cancer Subtype Classification

Sheetal Rajpal , Virendra Kumar , Manoj Agarwal , Naveen Kumar

Categories: Machine Learning

2021-11-06

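Breast cancer has long been a prominent cause of mortality among women. Diagnosis, therapy, and prognosis are now possible thanks to the availability of RNA sequencing tools capable of recording gene expression data. Because molecular subtyping is closely related to devising clinical strategy and prognosis, this paper focuses on using gene expression data to classify breast cancer into four subtypes, namely Basal, Her2, LumA, and LumB. In stage 1, we propose a deep learning-based model that uses an autoencoder to reduce dimensionality. The autoencoder reduces the size of the feature set from 20,530 gene expression values to 500. This encoded representation is passed to the deep neural network of the second stage, which classifies patients into the four molecular subtypes of breast cancer. By deploying the combined network of stages 1 and 2, we attain a mean 10-fold test accuracy of 0.907 on the TCGA breast cancer dataset. The proposed framework is fairly robust across 10 different runs, as shown by the boxplot of classification accuracy. Compared with related work reported in the literature, we achieve a competitive outcome. In conclusion, the proposed two-stage deep learning-based model accurately classifies four breast cancer subtypes, highlighting the autoencoder's capacity to derive a compact representation and the neural network classifier's ability to correctly label breast cancer patients.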

Piezoelectric Soft Robot Inchworm Motion by Controlling Ground Friction through Robot Shape

Zhiwu Zheng , Prakhar Kumar , Yenan Chen , Hsin Cheng , Sigurd Wagner , Minjie Chen , Naveen Verma , James C. Sturm

Categories: Robotics

2021-11-01

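Electrically driven soft robots enable small and light bodies, as well as environmental compatibility, varied locomotion, and safe operation. In particular, electrostatic actuators (for example, piezoelectric actuators) respond quickly. However, scalable approaches to seamless integration and untethered operation remain unclear. In addition, modeling the soft-body nature of such robots, including environmental interactions, is a long-standing challenge. Furthermore, more locomotion mechanisms need to be explored. In this paper, we design, model, and demonstrate a soft robot that, for the first time, begins to address all of these questions. It has a linear array of five actuators in a planar structure, which opens the door to integration and untethered operation. A new inchworm-inspired crawling mechanism that relies on posture self-adjustment was designed and validated. The first analytical soft-body model that includes piezoelectricity, gravity, and ground interactions, and that explains the robot's locomotion well, was developed and validated by experiments. We demonstrate the robot's forward and backward motion and explore the effects of payload and driving speed: the robot moves 1.2 mm per cycle and can carry a payload of up to 200 g (16 times its body weight) while moving. This work paves the way for fast-responding robots in complex, unknown environments.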

e-Inu: Simulating A Quadruped Robot With Emotional Sentience

Abhiruph Chakravarty , Jatin Karthik Tripathy , Sibi Chakkaravarthy S , Aswani Kumar Cherukuri , S. Anitha , Firuz Kamalov , Annapurna Jonnalagadda

Categories: Robotics | Machine Learning

2023-01-03

Quadruped robots are currently used in industrial robotics as mechanical aid to automate several routine tasks. However, presently, the usage of such a robot in a domestic setting is still very much a part of the research. This paper discusses the understanding and virtual simulation of such a robot capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expression on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains and detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish the framework of simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio response. The emotion detection from the speech was not as performant as ERANNs or Zeta Policy learning, still managing an accuracy of 63.5%. The video emotion detection system produced results that are almost at par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm was extremely rapid to learn, allowing the simulated dog to demonstrate a remarkably seamless gait across the different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.

NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory

Santhosh Kumar Ramakrishnan , Ziad Al-Halah , Kristen Grauman

Categories: Computer Vision

2023-01-02

Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window outputs) and its needle-in-a-haystack nature makes it both technically challenging and expensive to supervise. We introduce Narrations-as-Queries (NaQ), a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model. Validating our idea on the Ego4D benchmark, we find it has tremendous impact in practice. NaQ improves multiple top models by substantial margins (even doubling their accuracy), and yields the very best results to date on the Ego4D NLQ challenge, soundly outperforming all challenge winners in the CVPR and ECCV 2022 competitions and topping the current public leaderboard. Beyond achieving the state-of-the-art for NLQ, we also demonstrate unique properties of our approach such as gains on long-tail object queries, and the ability to perform zero-shot and few-shot NLQ.
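
The core idea of turning narrations into localization training data can be sketched as follows. This is a rough illustration under stated assumptions, not the NaQ implementation. The abstract does not say how a temporal window is derived from a narration, so the fixed `half_width` around the narration timestamp is an assumption, and the `Narration` and `QuerySample` types and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Narration:
    timestamp: float  # seconds into the video at which the narration was written
    text: str


@dataclass
class QuerySample:
    query: str    # narration text reused as a natural language query
    start: float  # start of the target temporal window, in seconds
    end: float    # end of the target temporal window, in seconds


def narrations_to_queries(narrations: List[Narration],
                          video_duration: float,
                          half_width: float = 2.0) -> List[QuerySample]:
    """Turn each timestamped narration into a (query, window) training pair.

    Assumption: the window is a fixed interval centred on the timestamp and
    clipped to the bounds of the video.
    """
    samples = []
    for n in narrations:
        start = max(0.0, n.timestamp - half_width)
        end = min(video_duration, n.timestamp + half_width)
        samples.append(QuerySample(n.text, start, end))
    return samples


# Toy example: two narrations in a 60-second clip.
clip_narrations = [
    Narration(1.0, "C opens the fridge"),
    Narration(59.5, "C picks up a cup"),
]
samples = narrations_to_queries(clip_narrations, video_duration=60.0)
```

Each resulting pair has the same shape as an annotated NLQ example (a text query plus a start and end time), so in principle it could be mixed into the training set of a query localization model without changing the model.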
